AITopics | Surabaya

Collaborating Authors

Surabaya

CVQA: Culturally-diverseMultilingual VisualQuestionAnsweringBenchmark

Neural Information Processing SystemsFeb-8-2026, 09:26:38 GMT

Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Asia > India (0.05)
North America > United States > Michigan (0.05)
Europe > Spain > Galicia > Madrid (0.05)
(33 more...)

Genre:

Overview (0.46)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

1568882ba1a50316e87852542523739c-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-9-2025, 19:17:25 GMT

category, dataset, proceedings, (13 more...)

Neural Information Processing Systems

Country:

Asia > India (0.05)
Asia > Philippines (0.04)
North America > United States > Michigan (0.04)
(45 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

MAD: Manifold Attracted Diffusion

Elbrächter, Dennis, Alberti, Giovanni S., Santacesaria, Matteo

arXiv.org Machine LearningSep-30-2025

Score-based diffusion models are a highly effective method for generating samples from a distribution of images. We consider scenarios where the training data comes from a noisy version of the target distribution, and present an efficiently implementable modification of the inference procedure to generate noiseless samples. Our approach is motivated by the manifold hypothesis, according to which meaningful data is concentrated around some low-dimensional manifold of a high-dimensional ambient space. The central idea is that noise manifests as low magnitude variation in off-manifold directions in contrast to the relevant variation of the desired distribution which is mostly confined to on-manifold directions. We introduce the notion of an extended score and show that, in a simplified setting, it can be used to reduce small variations to zero, while leaving large variations mostly unchanged. We describe how its approximation can be computed efficiently from an approximation to the standard score and demonstrate its efficacy on toy problems, synthetic data, and real data.

diffusion model, inference, manifold, (15 more...)

arXiv.org Machine Learning

2509.2471

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
Europe > Italy > Liguria > Genoa (0.04)
Asia > Indonesia > Java > East Java > Surabaya (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models

Dong, Fangzhou, Zeng, Yifan, Sang, Yingpeng, Shen, Hong

arXiv.org Artificial IntelligenceJun-27-2025

Large Language Models (LLMs) excel in understanding and generating text but struggle with providing professional literary criticism for works with profound thoughts and complex narratives. This paper proposes GLASS (Greimas Literary Analysis via Semiotic Square), a structured analytical framework based on Greimas Semiotic Square (GSS), to enhance LLMs' ability to conduct in-depth literary analysis. GLASS facilitates the rapid dissection of narrative structures and deep meanings in narrative works. We propose the first dataset for GSS-based literary criticism, featuring detailed analyses of 48 works. Then we propose quantitative metrics for GSS-based literary criticism using the LLM-as-a-judge paradigm. Our framework's results, compared with expert criticism across multiple works and LLMs, show high performance. Finally, we applied GLASS to 39 classic works, producing original and high-quality analyses that address existing research gaps. This research provides an AI-based tool for literary research and education, offering insights into the cognitive mechanisms underlying literary engagement.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.2136

Country:

Asia > Indonesia > Java > East Java > Surabaya (0.05)
Oceania > Australia > Queensland (0.04)
Asia > China > Beijing > Beijing (0.04)
(6 more...)

Genre: Research Report > Experimental Study (0.34)

Industry:

Leisure & Entertainment (0.68)
Media (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering

Sutanto, Patrick, Santoso, Joan, Setiawan, Esther Irawati, Wibawa, Aji Prasetya

arXiv.org Artificial IntelligenceDec-30-2024

Multiple Choice Question Answering (MCQA) is an important problem with numerous real-world applications, such as medicine, law, and education. The high cost of building MCQA datasets makes few-shot learning pivotal in this domain. While Large Language Models (LLMs) can enable few-shot learning, their direct application in real-world scenarios is often hindered by their high computational cost. To address this challenge, we propose a simple yet effective approach that uses LLMs for data generation and scoring. Our approach utilizes LLMs to create MCQA data which contains questions and choices, and to assign probability scores to the generated choices. We then use the generated data and LLM-assigned scores to finetune a smaller and more efficient encoder-only model, DeBERTa-v3-base by leveraging distillation loss. Extensive experiments on the Massive Multitask Language Understanding (MMLU) benchmark demonstrate that our method improves accuracy from 28.9% to 39.3%, representing a gain of over 10% compared to a baseline finetuned directly on 5-shot examples. This shows the effectiveness of LLM-driven data generation and knowledge distillation for few-shot MCQA.

dataset, generation method, llm, (13 more...)

arXiv.org Artificial Intelligence

2412.09807

Country:

Europe > United Kingdom > England (0.14)
Asia > Indonesia > Java > East Java > Surabaya (0.04)
North America > United States > Virginia (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Health & Safety > School Nutrition (0.93)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Assessing the Human Likeness of AI-Generated Counterspeech

Song, Xiaoying, Mamidisetty, Sujana, Blanco, Eduardo, Hong, Lingzi

arXiv.org Artificial IntelligenceDec-15-2024

Counterspeech is a targeted response to counteract and challenge abusive or hateful content. It effectively curbs the spread of hatred and fosters constructive online communication. Previous studies have proposed different strategies for automatically generated counterspeech. Evaluations, however, focus on relevance, surface form, and other shallow linguistic characteristics. This paper investigates the human likeness of AI-generated counterspeech, a critical factor influencing effectiveness. We implement and evaluate several LLM-based generation strategies, and discover that AI-generated and human-written counterspeech can be easily distinguished by both simple classifiers and humans. Further, we reveal differences in linguistic characteristics, politeness, and specificity. The dataset used in this study is publicly available for further research.

counterspeech, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2410.11007

Country:

North America > United States > Texas (0.14)
North America > United States > Arizona (0.04)
Asia > Indonesia > Java > East Java > Surabaya (0.04)
Asia > India (0.04)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Communications > Social Media (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

On the Cost of Model-Serving Frameworks: An Experimental Evaluation

De Rosa, Pasquale, Bromberg, Yérom-David, Felber, Pascal, Mvondo, Djob, Schiavoni, Valerio

arXiv.org Artificial IntelligenceNov-15-2024

In machine learning (ML), the inference phase is the process of applying pre-trained models to new, unseen data with the objective of making predictions. During the inference phase, end-users interact with ML services to gain insights, recommendations, or actions based on the input data. For this reason, serving strategies are nowadays crucial for deploying and managing models in production environments effectively. These strategies ensure that models are available, scalable, reliable, and performant for real-world applications, such as time series forecasting, image classification, natural language processing, and so on. In this paper, we evaluate the performances of five widely-used model serving frameworks (TensorFlow Serving, TorchServe, MLServer, MLflow, and BentoML) under four different scenarios (malware detection, cryptocoin prices forecasting, image classification, and sentiment analysis). We demonstrate that TensorFlow Serving is able to outperform all the other frameworks in serving deep learning (DL) models. Moreover, we show that DL-specific frameworks (TensorFlow Serving and TorchServe) display significantly lower latencies than the three general-purpose ML frameworks (BentoML, MLFlow, and MLServer).

machine learning, natural language, torchserve, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IC2E61754.2024.00032

2411.10337

Country:

Europe > Switzerland > Neuchâtel > Neuchâtel (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Time and Frequency Synergy for Source-Free Time-Series Domain Adaptations

Furqon, Muhammad Tanzil, Pratama, Mahardhika, Shiddiqi, Ary Mazharuddin, Liu, Lin, Habibullah, Habibullah, Dogancay, Kutluyil

arXiv.org Artificial IntelligenceOct-22-2024

The issue of source-free time-series domain adaptations still gains scarce research attentions. On the other hand, existing approaches rely solely on time-domain features ignoring frequency components providing complementary information. This paper proposes Time Frequency Domain Adaptation (TFDA), a method to cope with the source-free time-series domain adaptation problems. TFDA is developed with a dual branch network structure fully utilizing both time and frequency features in delivering final predictions. It induces pseudo-labels based on a neighborhood concept where predictions of a sample group are aggregated to generate reliable pseudo labels. The concept of contrastive learning is carried out in both time and frequency domains with pseudo label information and a negative pair exclusion strategy to make valid neighborhood assumptions. In addition, the time-frequency consistency technique is proposed using the self-distillation strategy while the uncertainty reduction strategy is implemented to alleviate uncertainties due to the domain shift problem. Last but not least, the curriculum learning strategy is integrated to combat noisy pseudo labels. Our experiments demonstrate the advantage of our approach over prior arts with noticeable margins in benchmark problems.

artificial intelligence, domain adaptation, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2410.17511

Country:

Oceania > Australia > South Australia (0.04)
Asia > Indonesia > Java > East Java > Surabaya (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia

Irfan, Azmul Asmar, Khatim, Nur Ahmad, Arief, Mansur M.

arXiv.org Artificial IntelligenceSep-25-2024

One of the key issues contributing to inefficiency in Puskesmas is the time-consuming nature of doctor-patient interactions. Doctors need to conduct thorough consultations, which include diagnosing the patient's condition, providing treatment advice, and transcribing detailed notes into medical records. In regions with diverse linguistic backgrounds, doctors often have to ask clarifying questions, further prolonging the process. While diagnosing is essential, transcription and summarization can often be automated using AI to improve time efficiency and help doctors enhance care quality and enable early diagnosis and intervention. This paper proposes a solution using a localized large language model (LLM) to transcribe, translate, and summarize doctor-patient conversations. We utilize the Whisper model for transcription and GPT-3 to summarize them into the ePuskemas medical records format. This system is implemented as an add-on to an existing web browser extension, allowing doctors to fill out patient forms while talking. By leveraging this solution for real-time transcription, translation, and summarization, doctors can improve the turnaround time for patient care while enhancing the quality of records, which become more detailed and insightful for future visits. This innovation addresses challenges like overcrowded facilities and the administrative burden on healthcare providers in Indonesia. We believe this solution will help doctors save time, provide better care, and produce more accurate medical records, representing a significant step toward modernizing healthcare and ensuring patients receive timely, high-quality care, even in resource-constrained settings.

indonesia, summarization, transcription, (16 more...)

arXiv.org Artificial Intelligence

2409.17054

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Indonesia > Java > Jakarta > Jakarta (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis

Zhou, Fangshuo, Li, Huaxia, Hu, Rui, Wu, Sensen, Feng, Hailin, Du, Zhenhong, Xu, Liuchang

arXiv.org Artificial IntelligenceSep-25-2024

Volunteer Geographic Information (VGI), with its rich variety, large volume, rapid updates, and diverse sources, has become a critical source of geospatial data. However, VGI data from platforms like OSM exhibit significant quality heterogeneity across different data types, particularly with urban building data. To address this, we propose a multi-source geographic data transformation solution, utilizing accessible and complete VGI data to assist in generating urban building footprint data. We also employ a multimodal data generation framework to improve accuracy. First, we introduce a pipeline for constructing an 'image-text-metadata-building footprint' dataset, primarily based on road network data and supplemented by other multimodal data. We then present ControlCity, a geographic data transformation method based on a multimodal diffusion model. This method first uses a pre-trained text-to-image model to align text, metadata, and building footprint data. An improved ControlNet further integrates road network and land-use imagery, producing refined building footprint data. Experiments across 22 global cities demonstrate that ControlCity successfully simulates real urban building patterns, achieving state-of-the-art performance. Specifically, our method achieves an average FID score of 50.94, reducing error by 71.01% compared to leading methods, and a MIoU score of 0.36, an improvement of 38.46%. Additionally, our model excels in tasks like urban morphology transfer, zero-shot city generation, and spatial data completeness assessment. In the zero-shot city task, our method accurately predicts and generates similar urban structures, demonstrating strong generalization. This study confirms the effectiveness of our approach in generating urban building footprint data and capturing complex city characteristics.

building footprint, controlcity, morphology, (16 more...)

arXiv.org Artificial Intelligence

2409.17049

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Beijing > Beijing (0.05)
Asia > Indonesia > Java > Jakarta > Jakarta (0.05)
(18 more...)

Genre: Research Report > New Finding (0.67)

Industry: Transportation > Ground > Road (0.55)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback